Hungarian Dependency Treebank

نویسندگان

  • Veronika Vincze
  • Dóra Szauter
  • Attila Almási
  • György Móra
  • Zoltán Alexin
  • János Csirik
چکیده

Herein, we present the process of developing the first Hungarian Dependency TreeBank. First, short references are made to dependency grammars we considered important in the development of our Treebank. Second, mention is made of existing dependency corpora for other languages. Third, we present the steps of converting the Szeged Treebank into dependency-tree format: from the originally phrase-structured treebank, we produced dependency trees by automatic conversion, checked and corrected them thereby creating the first manually annotated dependency corpus for Hungarian. We also go into detail about the two major sets of problems, i.e. coordination and predicative nouns and adjectives. Fourth, we give statistics on the treebank: by now, we have completed the annotation of business news, newspaper articles, legal texts and texts in informatics, at the same time, we are planning to convert the entire corpus into dependency tree format. Finally, we give some hints on the applicability of the system: the present database may be utilized – among others – in information extraction and machine translation as well.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Empirical Evaluation of Automatic Conversion from Constituency to Dependency in Hungarian

In this paper, we investigate the differences between Hungarian sentence parses based on automatically converted and manually annotated dependency trees. We also train constituency parsers on the manually annotated constituency treebank and then convert their output to dependency trees. We argue for the importance of training on gold standard corpora, and we also demonstrate that although the r...

متن کامل

Dependency Parsing of Hungarian: Baseline Results and Challenges

Hungarian is a stereotype of morphologically rich and non-configurational languages. Here, we introduce results on dependency parsing of Hungarian that employ a 80K, multi-domain, fully manually annotated corpus, the Szeged Dependency Treebank. We show that the results achieved by state-of-the-art data-driven parsers on Hungarian and English (which is at the other end of the configurational-non...

متن کامل

Dependency Parsing for Identifying Hungarian Light Verb Constructions

Light verb constructions (LVCs) are verb and noun combinations in which the verb has lost its meaning to some degree and the noun is used in one of its original senses. They often share their syntactic pattern with other constructions (e.g. verbobject pairs) thus LVC detection can be viewed as classifying certain syntactic patterns as light verb constructions or not. In this paper, we explore a...

متن کامل

Detecting Multiword Expressions by Dependency Parsing

In this poster, we present how different types of MWEs can be identified by dependency parsers in different languages. In our investigations, we focus on English verb-particle constructions (VPCs), Hungarian light verb constructions (LVCs) and German light verb constructions. In our experiments, we exploit the fact that some treebanks contain MWE-aware annotations, i.e. there are MWE-specific m...

متن کامل

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010